181 research outputs found

    Fibonacci Binning

    This note argues that when dot-plotting distributions typically found in papers about web and social networks (degree distributions, component-size distributions, etc.), and more generally distributions that have high variability in their tail, an exponentially binned version should always be plotted, too. It suggests Fibonacci binning as a visually appealing, easy-to-use and practical choice.
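A minimal sketch of what exponential binning with Fibonacci endpoints might look like. The abstract does not specify the details, so the bin layout (half-open integer bins `[1, 2), [2, 3), [3, 5), [5, 8), ...`), the choice of plotting each bin at its midpoint with the mean frequency, and the function names `fibonacci_bins` and `fibonacci_binning` are all assumptions made for illustration.

```python
def fibonacci_bins(max_value):
    """Half-open integer bins [lo, hi) with consecutive Fibonacci endpoints.

    Assumed layout: [1, 2), [2, 3), [3, 5), [5, 8), ... until max_value is covered.
    """
    bins = []
    lo, hi = 1, 2
    while lo <= max_value:
        bins.append((lo, hi))
        lo, hi = hi, lo + hi
    return bins


def fibonacci_binning(freq):
    """Bin a frequency table {positive integer value: count}.

    Returns (x, y) points where x is the midpoint of the integers in the bin
    and y is the mean count over every integer in the bin (missing values
    count as zero). Empty bins are skipped so the result suits a log-log plot.
    """
    points = []
    for lo, hi in fibonacci_bins(max(freq)):
        width = hi - lo
        mean = sum(freq.get(v, 0) for v in range(lo, hi)) / width
        if mean > 0:
            points.append(((lo + hi - 1) / 2, mean))
    return points


# Hypothetical degree distribution: degree -> number of nodes.
degree_counts = {1: 10, 2: 6, 3: 4, 4: 2, 7: 3}
binned = fibonacci_binning(degree_counts)
```

Because bin widths grow geometrically, sparse and noisy tail values are averaged over progressively wider ranges, which is the smoothing effect exponential binning is meant to provide.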

    Broadword Implementation of Parenthesis Queries

    We continue the line of research started in "Broadword Implementation of Rank/Select Queries" proposing broadword (a.k.a. SWAR, "SIMD Within A Register") algorithms for finding matching closed parentheses and the k-th far closed parenthesis. Our algorithms work in time O(log w) on a word of w bits, and contain no branch and no test instruction. On 64-bit (and wider) architectures, these algorithms make it possible to avoid costly tabulations, while providing a very significant speedup with respect to for-loop implementations.
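To illustrate the broadword style in general (this is not the parenthesis-matching algorithm of the paper), the classic branch-free population count below treats a 64-bit word as a vector of small fields and sums them in O(log w) steps. Python integers are unbounded, so the sketch masks to 64 bits explicitly; the name `popcount64` is chosen for this example.

```python
MASK64 = (1 << 64) - 1


def popcount64(x):
    """Count the set bits of a 64-bit word with no branches and no loops."""
    x &= MASK64
    # Each 2-bit field now holds the number of ones it originally contained.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum adjacent 4-bit fields into bytes (each byte is at most 8).
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # Multiplying by 0x0101...01 accumulates all byte sums in the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56
```

The same ingredients (field masks, parallel partial sums, a multiplication used as a horizontal add) are the building blocks broadword algorithms combine for more elaborate queries.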

    Supremum-Norm Convergence for Step-Asynchronous Successive Overrelaxation on M-matrices

    Step-asynchronous successive overrelaxation updates the values contained in a single vector using the usual Gauss-Seidel-like weighted rule, but arbitrarily mixing old and new values, the only constraint being temporal coherence: you cannot use a value before it has been computed. We show that given a nonnegative real matrix $A$, a $\sigma\geq\rho(A)$ and a vector $\boldsymbol w>0$ such that $A\boldsymbol w\leq\sigma\boldsymbol w$, every iteration of step-asynchronous successive overrelaxation for the problem $(sI-A)\boldsymbol x=\boldsymbol b$, with $s>\sigma$, reduces geometrically the $\boldsymbol w$-norm of the current error by a factor that we can compute explicitly. Then, we show that given a $\sigma>\rho(A)$ it is in principle always possible to compute such a $\boldsymbol w$. This property makes it possible to estimate the supremum norm of the absolute error at each iteration without any additional hypothesis on $A$, even when $A$ is so large that computing the product $A\boldsymbol x$ is feasible, but estimating the supremum norm of $(sI-A)^{-1}$ is not.
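For orientation, here is a minimal sketch of ordinary sequential successive overrelaxation applied to $(sI-A)\boldsymbol x=\boldsymbol b$, together with a $\boldsymbol w$-weighted supremum norm. It is the plain in-order sweep, not the step-asynchronous variant analysed in the paper, and the function names, the dense list-of-lists matrix, and the tiny test system are assumptions made for illustration.

```python
def sor_sweep(A, s, b, x, omega=1.0):
    """One in-place SOR sweep for (s*I - A) x = b.

    omega = 1.0 reduces to Gauss-Seidel. Row i of the system reads
    (s - A[i][i]) * x[i] - sum_{j != i} A[i][j] * x[j] = b[i].
    """
    n = len(b)
    for i in range(n):
        # Entries with j < i were already updated earlier in this sweep.
        off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
        gauss_seidel = (b[i] + off_diag) / (s - A[i][i])
        x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x


def weighted_sup_norm(v, w):
    """The w-norm max_i |v_i| / w_i for a vector w with positive entries."""
    return max(abs(vi) / wi for vi, wi in zip(v, w))


# Hypothetical 2x2 example: A is nonnegative with rho(A) = 1, and
# w = (1, 1) satisfies A w <= sigma w with sigma = 1 < s = 3.
A = [[0.0, 1.0], [1.0, 0.0]]
s, b, w = 3.0, [1.0, 1.0], [1.0, 1.0]
exact = [0.5, 0.5]  # solves 3*x0 - x1 = 1 and -x0 + 3*x1 = 1

x = [0.0, 0.0]
errors = []
for _ in range(20):
    sor_sweep(A, s, b, x)
    errors.append(weighted_sup_norm([xi - ei for xi, ei in zip(x, exact)], w))
```

In this small example the recorded `errors` shrink from one sweep to the next, which is the kind of geometric decrease in the weighted norm that the abstract describes for the asynchronous setting.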

    Stanford Matrix Considered Harmful

    This note argues about the validity of web-graph data used in the literature

    An experimental exploration of Marsaglia's xorshift generators, scrambled

    Marsaglia recently proposed xorshift generators as a class of very fast, good-quality pseudorandom number generators. Subsequent analysis by Panneton and L'Ecuyer has lowered the expectations raised by Marsaglia's paper, showing several weaknesses of such generators, verified experimentally using the TestU01 suite. Nonetheless, many of the weaknesses of xorshift generators fade away if their result is scrambled by a non-linear operation (as originally suggested by Marsaglia). In this paper we explore the space of possible generators obtained by multiplying the result of a xorshift generator by a suitable constant. We sample generators at 100 equispaced points of their state space and obtain detailed statistics that lead us to choices of parameters that improve on the current ones. We then explore for the first time the space of high-dimensional xorshift generators, following another suggestion in Marsaglia's paper, finding choices of parameters providing periods of length $2^{1024} - 1$ and $2^{4096} - 1$. The resulting generators are of extremely high quality, faster than current similar alternatives, and generate long-period sequences passing strong statistical tests using only eight logical operations, one addition and one multiplication by a constant.
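The idea of scrambling a xorshift state by a constant multiplication can be sketched with the small 64-bit member of the family. The abstract gives no parameters, so the shift triple (12, 25, 27) and the multiplier below are the values commonly quoted for "xorshift64*" and should be treated as assumptions here; the class name is also invented for this example. The high-dimensional generators mentioned in the abstract keep a larger state and are not shown.

```python
MASK64 = (1 << 64) - 1
MULTIPLIER = 2685821657736338717  # assumed constant, see the note above


class XorShift64Star:
    """Sketch of a 64-bit xorshift generator with a multiplicative scrambler."""

    def __init__(self, seed):
        seed &= MASK64
        if seed == 0:
            # The all-zero state is a fixed point of any xorshift step.
            raise ValueError("seed must be nonzero")
        self.state = seed

    def next(self):
        x = self.state
        x ^= x >> 12
        x ^= (x << 25) & MASK64  # mask emulates 64-bit overflow in Python
        x ^= x >> 27
        self.state = x
        # The linear xorshift state is scrambled by a non-linear multiplication.
        return (x * MULTIPLIER) & MASK64


gen = XorShift64Star(seed=1)
sample = [gen.next() for _ in range(5)]
```

Only the returned value is multiplied; the underlying state still evolves by the plain xorshift recurrence, so the period of the state sequence is unaffected by the scrambler.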

    Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics

    Minimal-interval semantics associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents. In this paper we provide algorithms for computing conjunction and disjunction that are linear in the number of intervals and logarithmic in the number of operands; for additional operators, such as ordered conjunction and Brouwerian difference, we provide linear algorithms. In all cases, space is linear in the number of operands. More importantly, we define a formal notion of optimal laziness, and either prove it, or prove its impossibility, for each algorithm. We cast our results in a general framework of antichains of intervals on total orders, making our algorithms directly applicable to other domains. Comment: 24 pages, 4 figures. A preliminary (now outdated) version was presented at SPIRE 200
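To make the notion of witnesses concrete, the sketch below computes the minimal intervals for a conjunction of terms given the sorted positions of each term in a document. It is a naive, eager construction written for clarity (quadratic filtering, no laziness), not the efficient algorithms of the paper, and the function names and the position-list input format are assumptions made for this example.

```python
from bisect import bisect_right


def minimal(intervals):
    """Keep only intervals that contain no other interval of the set.

    The result is an antichain with respect to inclusion, sorted by endpoint.
    """
    unique = set(intervals)
    return sorted(
        (l, r)
        for (l, r) in unique
        if not any((l2, r2) != (l, r) and l <= l2 and r2 <= r for (l2, r2) in unique)
    )


def and_witnesses(positions):
    """Minimal intervals containing at least one occurrence of every term.

    positions: one sorted list of integer positions per term.
    """
    candidates = []
    for r in sorted({p for plist in positions for p in plist}):
        lefts = []
        for plist in positions:
            k = bisect_right(plist, r)
            if k == 0:
                break  # this term has no occurrence at or before r
            lefts.append(plist[k - 1])  # latest occurrence not after r
        else:
            # Tightest interval ending at r that covers every term.
            candidates.append((min(lefts), r))
    return minimal(candidates)


# Hypothetical document: term "a" at positions 1 and 6, term "b" at 3 and 8.
witnesses = and_witnesses([[1, 6], [3, 8]])
```

Every minimal interval must end at an occurrence of some term, so scanning right endpoints and taking the tightest left endpoint for each yields a superset of the witnesses, which `minimal` then prunes to the antichain.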

    Four Degrees of Separation, Really

    We recently measured the average distance of users in the Facebook graph, spurring comments in the scientific community as well as in the general press ("Four Degrees of Separation"). A number of interesting criticisms have been made about the meaningfulness, methods and consequences of the experiment we performed. In this paper we want to discuss some methodological aspects that we deem important to underline in the form of answers to the questions we have read in newspapers, magazines, blogs, or heard from colleagues. We indulge in some reflections on the actual meaning of "average distance" and make a number of side observations showing that, yes, 3.74 "degrees of separation" are really few